
Draft of Keras pickle RFC #1

Merged: 26 commits from keras-pickle-edits into keras-pickle on Sep 21, 2020

Conversation

@stsievert (Collaborator)

What does this PR implement?
A draft of the Keras pickle edits. I'm submitting this PR because I want to discuss these changes without adding a bunch of unnecessary comments/notifications/reviews that the TF maintainers have to sift through.

This PR should be merged when the first draft is done.

Reference issues/PRs
This is a draft for tensorflow#286

@adriangb (Owner) commented Sep 2, 2020

I'm sorry, I only just saw this and in the meantime made edits to the original. I'll try to incorporate your version into mine.

@adriangb (Owner) commented Sep 2, 2020

Sorry for rewriting the history, @stsievert. I incorporated what you wrote into what I had. We can continue working here until we have something more final that we can upstream.

@adriangb (Owner) commented Sep 3, 2020

I think I'd also like to include a monkey-patched example of using Model.save, with a string to transfer the model back and forth, just as a POC.
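
Below is a rough sketch of what such a monkey-patched POC could look like (illustrative only, not the RFC's code; it round-trips through the single-file HDF5 format for simplicity, while the actual POC uses SavedModel, and the helper names here are made up):

import os
import tempfile

import tensorflow as tf


def _unpack_keras_model(packed_bytes):
    # Rebuild the Model from the bytes produced by Model.save.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.h5")
        with open(path, "wb") as f:
            f.write(packed_bytes)
        return tf.keras.models.load_model(path)


def _reduce_keras_model(self):
    # Serialize the Model to bytes via Model.save and return pickle instructions.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.h5")
        self.save(path)
        with open(path, "rb") as f:
            packed_bytes = f.read()
    return _unpack_keras_model, (packed_bytes,)


# Monkey patch: every Model (and subclass) now pickles through Model.save.
tf.keras.Model.__reduce__ = _reduce_keras_model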

@adriangb (Owner) commented Sep 3, 2020

A POC for using SavedModel as a backend for pickle is up in the notebook: https://colab.research.google.com/drive/1SEGXDFfNl5i0Cy8a4xvRjWOOjdq3sOOC?authuser=1#scrollTo=VZIkAZyJE1CP

@adriangb (Owner) commented Sep 3, 2020

Anything else you think we need to add @stsievert?

@stsievert (Collaborator, Author) commented Sep 3, 2020 via email

@adriangb (Owner) commented Sep 3, 2020

Okay, I'll let you do the next round of edits before I make any more changes.

@stsievert (Collaborator, Author)

I made some edits to the motivation and user benefits section. I'd still like to review the technical content.

@adriangb (Owner) commented Sep 3, 2020

Edits look great. And thanks for fixing the line lengths.

@stsievert (Collaborator, Author) commented Sep 3, 2020

Why implement __reduce_ex__? __reduce_ex__ only exists to support different pickle protocol versions (docs). AFAICT the protocol argument isn't used. Why not __reduce__?

I'd still like to make a couple more edits to this RFC. I'll try to make the edits later tomorrow. Feel free to make some edits; I revised the technical section.

@adriangb (Owner) commented Sep 3, 2020

Two main reasons:

  1. I recall running into issues using just __reduce__ when I first tried it. I think it was breaking some test that implemented it in subclassed models, but I don't remember, tbh. Might be worth trying again.
  2. I think it's likely that they will want to use the protocol argument in the future, e.g. to map it to different versions of the SavedModel format (see the sketch below).
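
A minimal sketch of the distinction (illustrative only, not code from the RFC):

class PickleableModel:
    def __reduce__(self):
        # No access to the pickle protocol number requested by the caller.
        return (self.__class__, ())

    def __reduce_ex__(self, protocol):
        # The protocol number is available here, so it could later be used
        # to pick, say, a different SavedModel serialization path.
        if protocol >= 5:
            return (self.__class__, ())  # hypothetical protocol-aware branch
        return super().__reduce_ex__(protocol)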

@stsievert (Collaborator, Author) commented Sep 4, 2020

@mrocklin we're proposing to add Pickle support to TensorFlow. Could you review our proposal? It's at 20200902-pickle-for-keras.md. I'd appreciate any of your thoughts, including ideas on other problems Pickle support would solve for Keras.

In this, I lifted some wording about ecosystem support from your post "Pickle isn't slow, it's a protocol." Here's the relevant paragraph:

Supporting Pickle will enable wider usage in the Python ecosystem because Python's ecosystem of libraries depends strongly on the presence of protocols. Without these protocols, each library has to implement a custom serialization method for every other library. For example, Dask Distributed has a custom serialization method for Keras at distributed/protocol/keras.py. See "Pickle isn't slow, it's a protocol" for more detail (notably, that post focuses on having an efficient Pickle implementation for PyTorch).

Let me know if I should change anything in that paragraph.

@mrocklin commented Sep 4, 2020

I'm happy to see this work. I'm a bit slammed at the moment, but maybe instead of reviewing I could try to conscript @jakirkham? He has been thinking a lot about serialization recently, and I think it would be good to have someone from the RAPIDS team engaged regardless.

@jakirkham

I would suggest leveraging pickle protocol 5 where possible to avoid copying frames before sending them over the wire. This is something Dask (dask/distributed#3784) is able to leverage for zero-copy transmission.
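
For reference, a small illustration (not from the RFC) of protocol-5 out-of-band buffers with NumPy; the array data is handed to buffer_callback instead of being copied into the pickle stream:

import pickle

import numpy as np

arr = np.arange(1_000_000, dtype="float64")

buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# buffers now holds PickleBuffer objects that can be sent zero-copy;
# supply them back to pickle.loads to reconstruct the array.
restored = pickle.loads(payload, buffers=buffers)
assert np.array_equal(arr, restored)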

@jakirkham

Yep, NumPy also supports the pickle5 backport package, so if that's available, it will work on Python 3.5+.

Exactly, this is why we do that as well. NumPy is a common dependency, so most people already have it.
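
A common fallback pattern (a sketch, assuming the pickle5 backport is installed on older Pythons):

try:
    import pickle5 as pickle  # backport providing protocol 5 on older Python versions
except ImportError:
    import pickle  # Python 3.8+ has protocol 5 built in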

@jakirkham

cc @quasiben (for vis)

@adriangb (Owner)

Hi folks, let's make a push to get this wrapped up this weekend so we can submit the PR in the TF repo sometime next week and start to get some feedback on that side. Please let me know if you see anything that would be a blocker to this.

@stsievert (Collaborator, Author)

Doesn't the implementation rely on NumPy's Pickle 5 support? So can't __reduce_ex__ be replaced with __reduce__?

I've deleted the metric tests. It's a minimal implementation. I think it's obvious that the tests will pass with NewMetric.

from tensorflow.keras.models import load_model

class Model:
...
class NewModel(Model):

@adriangb (Owner) commented on the snippet above:

I'm curious as to why the name change? I was envisioning these code blocks as representing "here's pseudocode of what this would look like if implemented in TF" and not necessarily "here's how users can create a picklable Model", which is the first thought that came to mind when I saw NewModel(Model).

@adriangb (Owner) commented Sep 19, 2020

Doesn't the implementation rely on NumPy's Pickle 5 support? So can't __reduce_ex__ be replaced with __reduce__?

I was thinking that, but I wasn't 100% sure. I also had the thought that (1) this is a private ecosystem method, not something users will really be looking at, so how nice the name looks doesn't really matter, and (2) __reduce_ex__ will allow more flexibility if anyone wants to use the pickle protocol version within the method in the future.

I've deleted the metric tests. It's a minimal implementation. I think it's obvious that the tests will pass with NewMetric.

👍

@stsievert (Collaborator, Author)

(1) this is a private ecosystem method, not something users will really be looking at, so how nice the name looks doesn't really matter

I'm concerned with maintenance cost. Will some developer waste an hour looking into the difference between __reduce__ and __reduce_ex__ because the protocol argument isn't used?

(2) __reduce_ex__ will allow more flexibility if anyone wants to use the pickle protocol version within the method in the future.

I don't think "more flexibility" is relevant; I think __reduce__ is as flexible as __reduce_ex__ because the protocol argument isn't used in this implementation.

@adriangb (Owner)

I do see your point, but I think there are arguments either way. If you feel strongly about it and are 100% sure that we can use __reduce__ and still get the full benefits of protocol 5 support because we offload to NumPy, then by all means switch it back; I don't intend to hold this back over something that minor.

@adriangb (Owner)

I think we are pretty much good to go.

@stsievert, do you want me to change __reduce_ex__ back to __reduce__ before submitting this?

@stsievert (Collaborator, Author)

change __reduce_ex__ back to __reduce__

I think @jakirkham is the right person to ask.

I'm pretty sure zero-copy transmission remains possible with __reduce__, but I'm not absolutely certain. PEP 574 seems to indicate that it is:

We see that several conditions are required for the above to work:

  • __reduce__ or __reduce_ex__ must be able to return something that indicates a serializable no-copy buffer view.
  • ...
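
As a quick sanity check of that reading (illustrative only; the class and attribute names are made up): an object whose __reduce__ simply hands its NumPy weights back to pickle still gets out-of-band buffers, because NumPy itself implements protocol 5.

import pickle

import numpy as np


class WeightsWrapper:
    def __init__(self, weights):
        self.weights = weights

    def __reduce__(self):
        # Return the class and constructor arguments; pickle then serializes
        # the NumPy array itself, which supports protocol-5 buffers.
        return (WeightsWrapper, (self.weights,))


buffers = []
obj = WeightsWrapper(np.zeros(100_000))
payload = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
# buffers is non-empty: the array data was exported out-of-band, not copied inline.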

@adriangb (Owner)

Thank you, I'll wait on their response and then send this out.

@jakirkham

Yeah it shouldn't matter if we are just using NumPy to handle out-of-band pickling.

@adriangb (Owner)

Thank you all for your contributions. I'm going to merge this and move this to the TF repo.

@adriangb merged commit 62d98dc into keras-pickle on Sep 21, 2020
@adriangb deleted the keras-pickle-edits branch on Sep 21, 2020 at 18:55
@adriangb (Owner)

@stsievert can you sign the Google CLA to fix tensorflow#286 (comment) if you want to keep authorship of your commits?

@stsievert (Collaborator, Author)

I don't mind losing Git authorship. Feel free to remove it.
